MIXED_LINEAR_MODEL

Overview

The MIXED_LINEAR_MODEL function fits a Linear Mixed Effects Model (LMM), also known as a multilevel or hierarchical linear model, for analyzing data with grouped or nested structures. These models are essential when observations are not independent, such as in longitudinal studies, repeated measures designs, or when data are naturally clustered (e.g., students within schools, patients within clinics).

Unlike standard linear regression, which assumes all observations are independent, linear mixed models account for correlation within groups by incorporating both fixed effects (population-level parameters shared across all groups) and random effects (group-specific deviations from the population mean). This implementation uses the MixedLM class from the statsmodels library.

The general model for group i can be expressed as:

Y_i = X_i\beta + Z_i\gamma_i + \epsilon_i

where Y_i is the response vector, X_i is the fixed effects design matrix, \beta contains the fixed effects coefficients, Z_i is the random effects design matrix, \gamma_i \sim N(0, \Psi) is the random effects vector specific to group i, and \epsilon_i \sim N(0, \sigma^2 I) represents the residual errors.

The function supports random intercept models (when x_random is omitted), which allow each group to have its own baseline level, and random slope models (when x_random is provided), which allow both intercepts and covariate slopes to vary across groups. Parameters are estimated using Maximum Likelihood (ML) estimation, with the implementation following the Newton-Raphson and EM algorithms described by Lindstrom and Bates (1988).

The function returns fixed effects coefficients with standard errors, z-statistics, p-values, and confidence intervals, along with random effects variance components, log-likelihood, AIC, and BIC for model comparison. For more details, see the statsmodels Linear Mixed Effects Models documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MIXED_LINEAR_MODEL(y, x, groups, x_random, fit_intercept, alpha)

y (list[list], required): Column vector of dependent variable values (numeric).
x (list[list], required): Matrix of fixed effects predictors. Each column is a predictor.
groups (list[list], required): Column vector of group/cluster membership identifiers (numeric, integer coded).
x_random (list[list], optional, default: null): Matrix of random effects predictors. If omitted, uses random intercept only.
fit_intercept (bool, optional, default: true): Whether to include a fixed intercept term.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).

Returns (list[list]): 2D list with mixed model results, or error message string.

Examples

Example 1: Linear mixed model with random intercept

Inputs:

y	x	groups
1	1	1
2	2	1
3	3	1
4	1	2
5	2	2
6	3	2

Excel formula:

=MIXED_LINEAR_MODEL({1;2;3;4;5;6}, {1;2;3;1;2;3}, {1;1;1;2;2;2})

Expected output:

"non-error"

Example 2: Linear mixed model with three groups

Inputs:

y	x	groups
2	1	1
4	2	1
6	3	1
3	1	2
5	2	2
7	3	2
4	1	3
6	2	3
8	3	3

Excel formula:

=MIXED_LINEAR_MODEL({2;4;6;3;5;7;4;6;8}, {1;2;3;1;2;3;1;2;3}, {1;1;1;2;2;2;3;3;3})

Expected output:

"non-error"

Example 3: Linear mixed model with 90% confidence intervals

Inputs:

y	x	groups	alpha
10	1	1	0.1
12	2	1
14	3	1
11	1	2
13	2	2
15	3	2

Excel formula:

=MIXED_LINEAR_MODEL({10;12;14;11;13;15}, {1;2;3;1;2;3}, {1;1;1;2;2;2}, 0.1)

Expected output:

"non-error"

Example 4: Linear mixed model with larger dataset

Inputs:

y	x	groups
5	1	1
7	2	1
9	3	1
11	4	1
6	1	2
8	2	2
10	3	2
12	4	2

Excel formula:

=MIXED_LINEAR_MODEL({5;7;9;11;6;8;10;12}, {1;2;3;4;1;2;3;4}, {1;1;1;1;2;2;2;2})

Expected output:

"non-error"

Python Code

import math
import warnings

import numpy as np
import pandas as pd
from statsmodels.regression.mixed_linear_model import MixedLM as statsmodels_MixedLM

def mixed_linear_model(y, x, groups, x_random=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Linear Mixed Effects Model (LMM) with random intercepts and slopes.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable values (numeric).
        x (list[list]): Matrix of fixed effects predictors. Each column is a predictor.
        groups (list[list]): Column vector of group/cluster membership identifiers (numeric, integer coded).
        x_random (list[list], optional): Matrix of random effects predictors. If omitted, uses random intercept only. Default is None.
        fit_intercept (bool, optional): Whether to include a fixed intercept term. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with mixed model results, or error message string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_2d_numeric(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if not arr:
            return f"Invalid input: {name} cannot be empty."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if not row:
                return f"Invalid input: {name} row {i} cannot be empty."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)
    groups = to2d(groups)
    if x_random is not None:
        x_random = to2d(x_random)

    # Validate inputs
    err = validate_2d_numeric(y, "y")
    if err:
        return err
    err = validate_2d_numeric(x, "x")
    if err:
        return err
    err = validate_2d_numeric(groups, "groups")
    if err:
        return err
    if x_random is not None:
        err = validate_2d_numeric(x_random, "x_random")
        if err:
            return err

    # Check dimensions
    n_obs = len(y)
    if len(x) != n_obs:
        return "Invalid input: y and x must have the same number of rows."
    if len(groups) != n_obs:
        return "Invalid input: y and groups must have the same number of rows."
    if x_random is not None and len(x_random) != n_obs:
        return "Invalid input: y and x_random must have the same number of rows."

    # Check column consistency
    n_y_cols = len(y[0])
    for i, row in enumerate(y):
        if len(row) != n_y_cols:
            return f"Invalid input: y row {i} has {len(row)} columns, expected {n_y_cols}."
    if n_y_cols != 1:
        return "Invalid input: y must be a column vector (single column)."

    n_x_cols = len(x[0])
    for i, row in enumerate(x):
        if len(row) != n_x_cols:
            return f"Invalid input: x row {i} has {len(row)} columns, expected {n_x_cols}."

    n_groups_cols = len(groups[0])
    for i, row in enumerate(groups):
        if len(row) != n_groups_cols:
            return f"Invalid input: groups row {i} has {len(row)} columns, expected {n_groups_cols}."
    if n_groups_cols != 1:
        return "Invalid input: groups must be a column vector (single column)."

    if x_random is not None:
        n_xr_cols = len(x_random[0])
        for i, row in enumerate(x_random):
            if len(row) != n_xr_cols:
                return f"Invalid input: x_random row {i} has {len(row)} columns, expected {n_xr_cols}."

    # Validate scalar parameters
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Extract flat arrays
    y_flat = np.array([row[0] for row in y])
    groups_flat = np.array([row[0] for row in groups])

    # Build DataFrame for fixed effects
    # Transpose x: convert from list of rows to dict of columns
    x_dict = {}
    if fit_intercept:
        x_dict['Intercept'] = [1.0] * n_obs
    for col_idx in range(len(x[0])):
        x_dict[col_idx] = [row[col_idx] for row in x]
    x_df = pd.DataFrame(x_dict)

    # Build random effects structure
    if x_random is None:
        # Random intercept only - use formula interface
        exog_re = None
    else:
        xr_dict = {}
        for col_idx in range(len(x_random[0])):
            xr_dict[col_idx] = [row[col_idx] for row in x_random]
        exog_re = pd.DataFrame(xr_dict)

    # Fit the model
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore')
            model = statsmodels_MixedLM(
                endog=y_flat,
                exog=x_df,
                groups=groups_flat,
                exog_re=exog_re,
                missing='drop'
            )
            result = model.fit(reml=False)
    except Exception as e:
        return f"statsmodels.regression.mixed_linear_model.MixedLM error: {e}"

    # Check for convergence issues
    if hasattr(result, 'converged') and not result.converged:
        return "statsmodels.regression.mixed_linear_model.MixedLM error: Model did not converge."

    # Extract fixed effects results
    fe_params = result.fe_params
    bse = result.bse
    tvalues = result.tvalues
    pvalues = result.pvalues
    conf_int = result.conf_int(alpha=alpha)

    # Validate outputs don't contain NaN or inf
    for val in [fe_params, bse, tvalues, pvalues]:
        if hasattr(val, 'values'):
            arr = val.values
        else:
            arr = np.array(val)
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return "statsmodels.regression.mixed_linear_model.MixedLM error: Model produced non-finite values."

    output = []

    # Section 1: Fixed effects (only the fixed parameters, not random variance)
    output.append(['fixed_effects', 'parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper'])

    # Add all fixed effect parameters
    for i in range(len(fe_params)):
        param_name = str(fe_params.index[i])
        output.append([
            'fixed_effects',
            param_name,
            float(fe_params.iloc[i]),
            float(bse.iloc[i]),
            float(tvalues.iloc[i]),
            float(pvalues.iloc[i]),
            float(conf_int.iloc[i, 0]),
            float(conf_int.iloc[i, 1])
        ])

    # Section 2: Random effects variance components (padded to 8 columns)
    output.append(['random_effects', 'parameter', 'variance', 'std_dev', None, None, None, None])

    # Extract random effects covariance (returns DataFrame)
    cov_re = result.cov_re

    # Check if it's a DataFrame or other structure
    if hasattr(cov_re, 'values'):
        cov_re_array = cov_re.values
        # Check for NaN/inf in random effects
        if np.any(np.isnan(cov_re_array)) or np.any(np.isinf(cov_re_array)):
            return "statsmodels.regression.mixed_linear_model.MixedLM error: Random effects contain non-finite values."
        # Get diagonal elements (variances)
        if cov_re_array.ndim == 2:
            for i in range(cov_re_array.shape[0]):
                var = float(cov_re_array[i, i])
                if i < len(cov_re.index):
                    param_name = str(cov_re.index[i])
                else:
                    param_name = f'Random[{i}]'
                output.append(['', param_name, var, math.sqrt(var), None, None, None, None])
        elif cov_re_array.ndim == 0:
            # Scalar case
            var = float(cov_re_array)
            output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
        else:
            # 1D array
            var = float(cov_re_array[0])
            output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
    else:
        # Fallback for other types
        try:
            var = float(cov_re)
            if math.isnan(var) or math.isinf(var):
                return "statsmodels.regression.mixed_linear_model.MixedLM error: Random effects contain non-finite values."
            output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
        except:
            return "statsmodels.regression.mixed_linear_model.MixedLM error: Unable to extract random effects."

    # Model statistics (padded to 8 columns)
    llf = float(result.llf)
    aic_val = float(result.aic)
    bic_val = float(result.bic)
    scale_val = float(result.scale)

    # Validate model statistics
    if math.isnan(llf) or math.isinf(llf):
        return "statsmodels.regression.mixed_linear_model.MixedLM error: Log-likelihood is non-finite."
    if math.isnan(aic_val) or math.isinf(aic_val):
        return "statsmodels.regression.mixed_linear_model.MixedLM error: AIC is non-finite."
    if math.isnan(bic_val) or math.isinf(bic_val):
        return "statsmodels.regression.mixed_linear_model.MixedLM error: BIC is non-finite."

    output.append(['log_likelihood', llf, None, None, None, None, None, None])
    output.append(['aic', aic_val, None, None, None, None, None, None])
    output.append(['bic', bic_val, None, None, None, None, None, None])
    output.append(['scale', scale_val, None, None, None, None, None, None])

    return output

Overview

Excel Usage

Examples

Python Code

Online Calculator